Because the values span a very wide range, a log() transformation is applied for better visualization.
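A minimal sketch of the transformation, assuming a natural log applied with NumPy to strictly positive columns. The toy values below are illustrative only and do not come from the dataset; the notebook itself may use a different base or a log-scaled axis instead.

```python
import numpy as np
import pandas as pd

CO2 = "CO2 emissions (metric tons per capita)"

# Toy values for illustration only; they span several orders of magnitude.
df = pd.DataFrame({
    "gdpPercap": [500.0, 5_000.0, 50_000.0],
    CO2: [0.1, 1.0, 10.0],
})

# The natural log compresses the range; inputs must be strictly positive.
log_df = np.log(df)
```

After the transform, each column covers a range of about 4.6 units (ln 100), which is much easier to read on a scatter plot.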

For the year 1962, calculate the correlation between 'CO2 emissions (metric tons per capita)' and gdpPercap.
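A minimal sketch of this step, assuming the dataframe has a `Year` column (a hypothetical name here) and that the Pearson coefficient is the intended measure. The rows are toy data, not values from the real dataset.

```python
import numpy as np
import pandas as pd

CO2 = "CO2 emissions (metric tons per capita)"

# Toy rows for illustration only.
df = pd.DataFrame({
    "Year": [1962, 1962, 1962, 1962, 1967],
    CO2: [0.5, 2.0, np.nan, 8.0, 1.0],
    "gdpPercap": [800.0, 3000.0, 5000.0, 12000.0, 1500.0],
})

# Keep only 1962 and drop rows where either variable is missing.
df_1962 = df[df["Year"] == 1962].dropna(subset=[CO2, "gdpPercap"])

# Series.corr uses the Pearson coefficient by default.
r_1962 = df_1962[CO2].corr(df_1962["gdpPercap"])
```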

After dropping the NaN values, calculate the correlation between CO2 emissions and GDP per capita (gdpPercap) for each year.
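The per-year calculation could be sketched with a `groupby`, again assuming a `Year` column and Pearson correlation. The data are toy values chosen so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

CO2 = "CO2 emissions (metric tons per capita)"

# Toy rows for illustration only; the last row has a missing CO2 value.
df = pd.DataFrame({
    "Year": [1962, 1962, 1962, 1967, 1967, 1967, 1967],
    CO2: [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, np.nan],
    "gdpPercap": [10.0, 30.0, 20.0, 10.0, 20.0, 30.0, 40.0],
})

# Drop incomplete rows, then compute one Pearson coefficient per year.
by_year = (
    df.dropna(subset=[CO2, "gdpPercap"])
      .groupby("Year")[[CO2, "gdpPercap"]]
      .apply(lambda g: g[CO2].corr(g["gdpPercap"]))
)
```

`by_year` is a Series indexed by year, so it can be sorted or plotted directly.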

Based on the correlation matrix above, 1967 is the year with the strongest correlation between 'CO2 emissions (metric tons per capita)' and gdpPercap.

Filter the dataframe for the year with the strongest correlation.
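A minimal sketch of selecting that year and filtering on it. The per-year correlations below are hypothetical toy numbers, `Country Name` and `Year` are assumed column names, and "strongest" is taken to mean the largest absolute value.

```python
import pandas as pd

# Hypothetical per-year correlations (toy numbers, not the notebook's results).
corr_by_year = pd.Series({1962: 0.5, 1967: 0.9, 1972: -0.2}, name="r")

# Toy frame to filter.
df = pd.DataFrame({
    "Year": [1962, 1967, 1967, 1972],
    "Country Name": ["A", "A", "B", "A"],
})

# Take the year whose correlation has the largest magnitude.
strongest_year = corr_by_year.abs().idxmax()

# Boolean mask keeps only the rows for that year.
df_strongest = df[df["Year"] == strongest_year]
```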

Because the values span a very wide range, log() was used to transform the data for better visualization.

Question1: What is the relationship between continent and 'Energy use (kg of oil equivalent per capita)'?

Answer: We used boxplots to visualize the distribution of energy use across continents. To determine whether the differences between continents are statistically significant, we ran an ANOVA test. ANOVA was chosen because it is suited to comparing the means of more than two groups, and here energy use is compared across several continents (e.g., Africa, Asia, Europe, etc.).

The analysis shows significant differences in energy use among the continents. This suggests that continent has a significant effect on energy use: the amount of energy used per capita varies significantly from one continent to another.
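A one-way ANOVA of this kind could be sketched with `scipy.stats.f_oneway`, assuming a `continent` column as the grouping variable. The values are toy data, and the 0.05 threshold mentioned in the comment is an assumption, since the write-up does not state a significance level.

```python
import pandas as pd
from scipy import stats

ENERGY = "Energy use (kg of oil equivalent per capita)"

# Toy rows for illustration only.
df = pd.DataFrame({
    "continent": ["Africa"] * 3 + ["Asia"] * 3 + ["Europe"] * 3,
    ENERGY: [400.0, 500.0, 450.0, 900.0, 1100.0, 1000.0, 3000.0, 3500.0, 3200.0],
})

# One array of observations per continent, with missing values dropped.
groups = [g[ENERGY].dropna().to_numpy() for _, g in df.groupby("continent")]

# One-way ANOVA: tests whether all group means are equal.
f_stat, p_value = stats.f_oneway(*groups)

# A p-value below the chosen level (0.05 assumed here) indicates a difference.
```

A significant result only says that at least one continent differs; a post-hoc comparison would be needed to say which ones.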

Question2: Is there a significant difference between Europe and Asia with respect to 'Imports of goods and services (% of GDP)' in the years after 1990? (Stats test needed)

Answer: To compare 'Imports of goods and services (% of GDP)' between Europe and Asia in the years after 1990, we created a line plot showing the trend over time for both continents. We then used a t-test to determine whether the difference in imports between Europe and Asia is statistically significant. A t-test is the usual choice when comparing two groups on a single variable.

The analysis shows no significant difference in 'Imports of goods and services (% of GDP)' between Europe and Asia in the years after 1990.
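A two-sample t-test along these lines could be sketched with `scipy.stats.ttest_ind`. The `Year` and `continent` column names are assumptions, the rows are toy data, and Welch's variant (`equal_var=False`) is an assumption as well, since the write-up does not say which form of the test was used.

```python
import pandas as pd
from scipy import stats

IMPORTS = "Imports of goods and services (% of GDP)"

# Toy rows for illustration only.
df = pd.DataFrame({
    "Year": [1987, 1992, 1992, 1997, 1997, 2002, 2002, 2007, 2007],
    "continent": ["Europe", "Europe", "Asia", "Europe", "Asia",
                  "Europe", "Asia", "Europe", "Asia"],
    IMPORTS: [30.0, 35.0, 38.0, 40.0, 36.0, 42.0, 45.0, 44.0, 41.0],
})

# Keep years strictly after 1990 and drop missing import values.
after_1990 = df[df["Year"] > 1990].dropna(subset=[IMPORTS])
europe = after_1990.loc[after_1990["continent"] == "Europe", IMPORTS]
asia = after_1990.loc[after_1990["continent"] == "Asia", IMPORTS]

# Welch's t-test does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(europe, asia, equal_var=False)
```

In this toy data the two group means are almost equal, so the p-value is large and the null hypothesis of equal means is not rejected.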

Question3: What is the country (or countries) that has the highest 'Population density (people per sq. km of land area)' across all years? (i.e., which country has the highest average ranking in this category across each time point in the dataset?)

Answer: To find the country (or countries) with the highest 'Population density (people per sq. km of land area)' across all years, we rank the countries by population density at each time point, average each country's rank across all time points, and visualize the average rankings in a bar chart. Monaco and Macao SAR, China have the highest 'Population density (people per sq. km of land area)' across all years.
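The average-ranking calculation could be sketched as follows, assuming `Country Name` and `Year` columns and treating rank 1 as the densest country in a given year. The rows are toy data for illustration only.

```python
import pandas as pd

DENSITY = "Population density (people per sq. km of land area)"

# Toy rows for illustration only.
df = pd.DataFrame({
    "Country Name": ["A", "B", "C"] * 2,
    "Year": [1962] * 3 + [1967] * 3,
    DENSITY: [100.0, 900.0, 50.0, 120.0, 950.0, 40.0],
})

# Rank countries within each year; rank 1 is the highest density.
df["rank"] = df.groupby("Year")[DENSITY].rank(ascending=False, method="min")

# Average each country's rank over all years; lowest average comes first.
avg_rank = df.groupby("Country Name")["rank"].mean().sort_values()

# Keep every country tied for the best average rank.
top = avg_rank[avg_rank == avg_rank.min()].index.tolist()
```

Keeping ties in `top` is one way more than one country can come out on top, although countries with data for only some years may need separate handling.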

Question4: What country (or countries) has shown the greatest increase in 'Life expectancy at birth, total (years)' between 1962 and 2007?

Answer: To find the country (or countries) that has shown the greatest increase in 'Life expectancy at birth, total (years)' between 1962 and 2007, we can calculate the difference in life expectancy between 2007 and 1962 for each country, and then sort the countries by this difference in descending order to identify the countries with the greatest increase. Maldives has shown the greatest increase in 'Life expectancy at birth, total (years)' between 1962 and 2007.
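The difference-and-sort step described above could be sketched with a pivot, assuming `Country Name` and `Year` columns and one row per country per year. The rows are toy data, not values from the dataset.

```python
import pandas as pd

LIFE = "Life expectancy at birth, total (years)"

# Toy rows for illustration only.
df = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B", "C", "C"],
    "Year": [1962, 2007] * 3,
    LIFE: [40.0, 70.0, 60.0, 75.0, 50.0, 65.0],
})

# One row per country, one column per year.
wide = df[df["Year"].isin([1962, 2007])].pivot(
    index="Country Name", columns="Year", values=LIFE
)

# Change from 1962 to 2007; countries missing either year are dropped.
increase = (wide[2007] - wide[1962]).dropna().sort_values(ascending=False)
```

The first entry of `increase` is the country with the greatest gain.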

View the HTML file: https://nbviewer.org/github/Ding7928/skill-assessments/blob/main/Python%20for%20Data%20Science/python_for_data_science.html